erlan-z
Contributor

@erlan-z erlan-z commented Aug 6, 2025

What this PR does:
Introduced RequestSource metadata to distinguish between API and ruler queries

Description:
Introduced RequestSource metadata to distinguish between API and ruler queries, and propagated it through services.

Previously, the source was parsed from the "User-Agent" header, which was only available in the query frontend (QFE). To extend this to other services and provide a unified experience across them, this PR instead keeps the source in the request context, similar to RequestId.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@dosubot dosubot bot added the component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. label Aug 6, 2025
@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from 5e39a8b to 53cedc7 on August 6, 2025 16:22
@erlan-z erlan-z changed the title resource based throttling: reject only addhock queries resource based throttling: reject only adhoc queries Aug 6, 2025
@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from 53cedc7 to 47da956 on August 6, 2025 16:25
Contributor

@justinjung04 justinjung04 left a comment

Thank you!

@@ -2262,7 +2263,7 @@ func (i *Ingester) trackInflightQueryRequest() (func(), error) {

 	i.maxInflightQueryRequests.Track(i.inflightQueryRequests.Inc())

-	if i.resourceBasedLimiter != nil {
+	if i.resourceBasedLimiter != nil && requestmeta.RequestSourceFromContext(ctx) == requestmeta.SourceApi {
Contributor

nit: what do you think of adding helper functions in requestmeta itself? i.e., requestmeta.RequestFromApi(ctx) and requestmeta.RequestFromRuler(ctx)

@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from 47da956 to 8bb3d4f on August 8, 2025 18:19
Contributor

@justinjung04 justinjung04 left a comment

LGTM, thank you

Comment on lines 7 to 10
const (
	SourceApi   = "api"
	SourceRuler = "ruler"
)
Member

@SungJin1212 SungJin1212 Aug 11, 2025

How about reusing tripperware.SourceAPI and tripperware.SourceRuler?

@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 15, 2025
@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from bfdbdd3 to 7cfa79a on August 15, 2025 20:21
Contributor

@yeya24 yeya24 left a comment

As a feature that intends to protect the storage layer, I am not sure it is the best approach to hardcode it by differentiating the source of requests and only rejecting one type but not others. I understand the use case behind this change. But if we go with this, it opens the door to extending throttling to other types of source, such as user agent, dashboard ID, etc.

I think a better approach is to go with #6922 and throttle heavy queries regardless of their source.

@erlan-z
Contributor Author

erlan-z commented Aug 16, 2025

Thanks for the review. I agree that #6922 is the right direction, since it avoids hard-coding throttling on user attributes like User-Agent or dashboard ID. That said, I see ruler vs. ad-hoc queries as a different category: it’s a system-level distinction, whereas the others are user-driven. So I don’t think this opens the door to extending throttling to those other characteristics.

I also recognize that ruler queries can sometimes be heavy, but that’s a rare corner case and not really the problem this feature is trying to solve. Because #6922 is more of a best-effort mechanism, I’d even suggest carrying this system-level distinction into that work as well, so that ruler queries are not rejected when its heavy-query detection is imprecise.

What do you think?

@yeya24
Contributor

yeya24 commented Aug 17, 2025

> That said, I see ruler vs. ad-hoc queries as a different category: it’s a system-level distinction, whereas the others are user-driven

I don't think we can easily identify queries coming from the API as ad-hoc queries. There are different use cases that might be even more important than rules, and it is up to the user to decide the priority. Even rule queries can come through the query API endpoint from remote evaluators like Thanos Ruler.

@erlan-z
Contributor Author

erlan-z commented Aug 17, 2025

Makes sense, thanks. I’m aligned with #6922. I realize it’s not always clear which queries are higher priority, and there can certainly be important API queries too. But since we can’t reliably identify those, it might help to at least avoid rejecting ruler queries, which in practice are usually the high-priority queries we can identify. Happy to leave this out here and revisit if needed.

@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from 7cfa79a to c4cfec4 on August 22, 2025 17:16
@erlan-z
Contributor Author

erlan-z commented Aug 26, 2025

Updated the PR to include only the change that adds the request source. We can leave out the resource-based throttling for ad-hoc queries for now.

@erlan-z erlan-z changed the title resource based throttling: reject only adhoc queries Introduced RequestSource metadata to distinguish between API and ruler queries Aug 26, 2025
…replace request source implementation on qfe when ruler calls qfe to unify experience across services

Signed-off-by: Erlan Zholdubai uulu <[email protected]>
@erlan-z erlan-z force-pushed the resource-based-throttling-reject-only-addhoc-queries branch from c4cfec4 to c87c478 on August 26, 2025 20:58
Contributor

@yeya24 yeya24 left a comment

Thanks!

@yeya24 yeya24 merged commit 43d8b30 into cortexproject:master Aug 29, 2025
18 checks passed
erlan-z added a commit to erlan-z/cortex that referenced this pull request Aug 29, 2025
…his shouldn't have been included as discussed in PR cortexproject#6947

Signed-off-by: Erlan Zholdubai uulu <[email protected]>
yeya24 pushed a commit that referenced this pull request Aug 29, 2025
…his shouldn't have been included as discussed in PR #6947 (#7000)

Signed-off-by: Erlan Zholdubai uulu <[email protected]>
danielblando pushed a commit that referenced this pull request Sep 2, 2025
* remove limiting query rejection to only adhoc queries for ingester. This shouldn't have been included as discussed in PR #6947

Signed-off-by: Erlan Zholdubai uulu <[email protected]>

* bug fixes for query rejection. fix metric and remove unused enabled flag.

Signed-off-by: Erlan Zholdubai uulu <[email protected]>

---------

Signed-off-by: Erlan Zholdubai uulu <[email protected]>
Labels
component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. size/L

4 participants